In general terms, a time series can be decomposed into three components:
A seasonal component \(S_t\)
A trend-cycle component \(T_t\)
Some authors separate the trend and cycle components. Here we consider a single trend-cycle component because the usual decomposition algorithms extract them together, not separately.
The remainder component \(R_t\)
Typically we distinguish between two types of decomposition schemes:
An additive scheme: the magnitude of the seasonal fluctuations around the trend-cycle does NOT vary with the level of the time series (that is, with the value of the trend component).
A multiplicative scheme: the magnitude of the seasonal fluctuations around the trend-cycle appears to be proportional to the level of the time series (that is, to the value of the trend component).
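Written as equations, the two schemes are usually stated as below. The symbol \(y_t\) for the observed series is an assumption made here, since the text above does not name it; the component symbols are the ones defined in the list of components.

```latex
\begin{align*}
\text{Additive:} \quad & y_t = S_t + T_t + R_t \\
\text{Multiplicative:} \quad & y_t = S_t \times T_t \times R_t
\end{align*}
```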
We will work with the number of persons employed in retail in the US as shown in Figure 3.5. We can see the monthly number of persons in thousands employed in the retail sector across the US since 1990:
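# Assumption: the fpp3 meta-package is installed. It provides us_employment
# and PBS, plus the tsibble, feasts, fable, dplyr and ggplot2 functions used below.
library(fpp3)

us_retail_employment <- us_employment %>%
  filter(year(Month) >= 1990, Title == "Retail Trade") %>%
  select(-Series_ID)

autoplot(us_retail_employment, Employed) +
  labs(y = "Persons (thousands)", title = "Total employment in US retail")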
With this code we can obtain the components using STL decomposition (more on that later in the chapter, do not worry too much about the code now):
dcmp_components <- us_retail_employment %>%
  # 1. Define and fit the decomposition model
  model(stl = STL(Employed)) %>%
  # 2. Extract the components from the fitted model. Note that components()
  # is not applicable to every type of model. It is applicable to
  # decomposition models and ETS models (later in the subject).
  components()
dcmp_components %>%
  # 1. Format data frame to tsibble
  as_tsibble() %>%
  # 2. Plot the original series and overlay the trend using ggplot
  autoplot(Employed, colour = "gray") +
  geom_line(aes(y = trend), colour = "#D55E00") +
  labs(y = "Persons (thousands)", title = "Total employment in US retail")
Using autoplot, we can have a look at all the components at once:
dcmp_components %>%
  autoplot()
Example 2 - Multiplicative scheme:
Let us return to the monthly Australian Medicare prescription data, because it shows a clear multiplicative scheme:
a10 <-
  # 1. Data frame containing monthly Medicare expenses per type of drug
  PBS %>%
  # 2. Filter for antidiabetic drugs
  filter(ATC2 == "A10") %>%
  # 3. Select a subset of columns
  select(Month, Concession, Type, Cost) %>%
  # 4. Compute the total expenditure per month
  index_by(Month) %>%
  summarise(TotalC = sum(Cost)) %>%
  # 5. Scale to millions
  mutate(Cost = TotalC / 1e6)

# 6. Plot
autoplot(a10, Cost) +
  labs(y = "$ (millions)", title = "Australian antidiabetic drug sales")

# Examine the dataset
a10
# A tsibble: 204 x 3 [1M]
Month TotalC Cost
<mth> <dbl> <dbl>
1 1991 Jul 3526591 3.53
2 1991 Aug 3180891 3.18
3 1991 Sep 3252221 3.25
4 1991 Oct 3611003 3.61
5 1991 Nov 3565869 3.57
6 1991 Dec 4306371 4.31
7 1992 Jan 5088335 5.09
8 1992 Feb 2814520 2.81
9 1992 Mar 2985811 2.99
10 1992 Apr 3204780 3.20
# ℹ 194 more rows
With this code we can obtain the components using X11 decomposition. We use this method because the series is multiplicative (more about that later in the chapter, do not worry too much about it now):
x11_dcmp <- a10 %>%
  # 1. Define and fit the decomposition model
  model(x11 = X_13ARIMA_SEATS(Cost ~ x11())) %>%
  # 2. Extract the components from the fitted model. Note that components()
  # is not applicable to every type of model. It is applicable to
  # decomposition models and ETS models (later in the subject).
  components()

x11_dcmp
Again we can depict the components using autoplot:
x11_dcmp %>%
  autoplot()
Multiplicative scheme under different transformations
Most of the time multiplicative schemes can be made more “additive-like” with certain mathematical transformations. We will study these in a separate notebook, but let us examine them in a cursory manner here:
Let us look at the effect of applying different transformations of increasing strength to stabilize variations. The purpose is to turn the multiplicative scheme into an additive scheme.
Specifically we will apply a square root, a cubic root, a log and an inverse transformation:
a10 <- a10 %>%
  # Define different versions of the transformed variable
  mutate(
    sqrt_cost = sqrt(Cost),
    cbrt_cost = Cost^(1/3),
    log_cost = log(Cost),
    inv_cost = -1 / Cost
  )

a10
Now let us explore the effect of each transformation on the data (please excuse the repetitive code; in a professional environment, a dedicated function or at least a for loop should be written to avoid code repetition):
In conclusion, in terms of the strength of these transformations to turn a multiplicative scheme into an additive scheme, we can state the following (here \(<\) reads as “is a weaker transformation than”):
\[
\sqrt{x} < \sqrt[3]{x} < \log(x) < \frac{1}{x}
\]
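A short derivation helps to see why the log in particular is associated with multiplicative schemes. Assuming a purely multiplicative series with strictly positive components, and writing \(y_t\) for the observed series (a symbol assumed here), taking logs turns the product into a sum:

```latex
\begin{align*}
y_t &= S_t \times T_t \times R_t \\
\log y_t &= \log S_t + \log T_t + \log R_t
\end{align*}
```

In other words, the logged series follows an additive scheme whose components are the logs of the original components. Real series are rarely purely multiplicative, which is why weaker or stronger transformations are sometimes a better fit.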
Further visual examples of additive vs. multiplicative
The following image further clarifies the basic difference between an additive and a multiplicative time series. It has been borrowed from reference [2].
Multiplicative vs additive schemes. From reference [2].
In reference [2], there is an interesting interactive game where you can practice recognizing the visual cues of additive and multiplicative schemes. I strongly encourage you to spend a few minutes on it.
Automating the identification of additive vs multiplicative schemes
Visual inspection is a first approach to assessing the scheme type and it is interesting for developing your intuition as analysts. However, when you are faced with batches of time series you need to process, it is definitely not the best approach. There are different ways in which the type of scheme could be evaluated.
One way of systematically assessing whether we should consider an additive or multiplicative scheme is to:
Decompose the series using a method that is appropriate for additive schemes and a method appropriate for a multiplicative scheme.
Assess the goodness of fit of each method, for example by measuring the amount of autocorrelation left in the remainder of each decomposition.
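The two steps above can be sketched in base R. This is only an illustration under several assumptions: it swaps in `decompose()` from the stats package instead of the fable/feasts functions used elsewhere in this notebook, it uses the built-in `AirPassengers` series as a stand-in, and `remainder_acf_score()` is a hypothetical helper whose score (sum of squared autocorrelations) is just one possible way of quantifying leftover autocorrelation.

```r
# Base R sketch: compare how much autocorrelation is left in the remainder
# of an additive and a multiplicative classical decomposition.
# AirPassengers ships with base R and is used only as a stand-in series.
x <- AirPassengers

fit_additive <- decompose(x, type = "additive")
fit_multiplicative <- decompose(x, type = "multiplicative")

# Hypothetical helper: sum of squared autocorrelations of the remainder
# at lags 1 to max_lag (smaller means less structure left over).
remainder_acf_score <- function(remainder, max_lag = 12) {
  r <- as.numeric(remainder)
  r <- r[!is.na(r)]  # classical decomposition leaves NAs at both ends
  rho <- acf(r, lag.max = max_lag, plot = FALSE)$acf[-1]  # drop lag 0
  sum(rho^2)
}

scores <- c(
  additive = remainder_acf_score(fit_additive$random),
  multiplicative = remainder_acf_score(fit_multiplicative$random)
)

print(scores)

# The scheme with the smaller score leaves a remainder closer to white noise.
best_scheme <- names(which.min(scores))
print(best_scheme)
```

When processing a batch of series, the same helper could be applied to each one in a loop, keeping the scheme with the lower score.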
Another possibility would be to:
Fit an exponential smoothing model with an additive scheme.
Fit an exponential smoothing model with a multiplicative scheme.
Compare their Akaike Information Criteria (\(AIC\) or \(AIC_c\)) to see which fits best.
By the end of the course you should be able to understand all this completely.
Detrended and Seasonally Adjusted time series
Detrended time series
If \(T_t\) is the trend-cycle component of a time series, the detrended time series \(D_t\) is computed by removing the trend component from the time series.
This is done differently depending on whether the time series is additive (subtract the trend) or multiplicative (divide by the trend).
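As a sketch, and writing \(y_t\) for the observed series (a symbol assumed here, not defined in the text), the two cases can be expressed as:

```latex
\begin{align*}
\text{Additive:} \quad & D_t = y_t - T_t = S_t + R_t \\
\text{Multiplicative:} \quad & D_t = \frac{y_t}{T_t} = S_t \times R_t
\end{align*}
```

In both cases the detrended series retains the seasonal component and the remainder.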
Seasonally adjusted time series
If \(S_t\) is the seasonal component, the seasonally adjusted time series \(A_t\) is computed by removing the seasonal component from the time series (hence the name).
This is done differently depending on whether the time series is additive (subtract the seasonal component) or multiplicative (divide by the seasonal component).
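As a sketch, and again writing \(y_t\) for the observed series (an assumed symbol), the seasonally adjusted series in each scheme is:

```latex
\begin{align*}
\text{Additive:} \quad & A_t = y_t - S_t = T_t + R_t \\
\text{Multiplicative:} \quad & A_t = \frac{y_t}{S_t} = T_t \times R_t
\end{align*}
```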
The seasonally adjusted series contains the remainder as well as the trend-cycle component. Therefore it is not “smooth”. Its short-term fluctuations, which are due to the remainder, can be misleading when trying to read a trend from it.
Technically, the slope of the tangent line to the curve (the derivative) changes abruptly every time-step due to these short term fluctuations. The trend is a smoother version with a smoother derivative and is therefore better to make an analysis of the overall direction of the time series.
If you want to look for turning points and interpret changes in direction, it is best to use the trend-cycle component rather than the seasonally adjusted data.
The seasonally adjusted data is useful if the variation due to seasonality is not of primary interest.
Example: unemployment data are usually seasonally adjusted in order to highlight variation due to the underlying state of the economy rather than the seasonal variation.
Example 1: additive scheme
Further up in this notebook we looked at the us_retail_employment time series and concluded it followed an additive scheme:
us_retail_employment %>%
  autoplot()
Plot variable not specified, automatically selected `.vars = Employed`
For this example we are going to fit a classical_decomposition model. The specifics of this syntax will be explained in later sessions, but they are fairly straightforward.
classical_dcmp <- us_retail_employment %>%
  # 1. Fit the model
  model(dcmp = classical_decomposition(Employed, type = "additive")) %>%
  # 2. Extract the components
  components()

classical_dcmp
# A dable: 357 x 7 [1M]
# Key: .model [1]
# : Employed = trend + seasonal + random
.model Month Employed trend seasonal random season_adjust
<chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
1 dcmp 1990 Jan 13256. NA -75.5 NA 13331.
2 dcmp 1990 Feb 12966. NA -273. NA 13239.
3 dcmp 1990 Mar 12938. NA -253. NA 13191.
4 dcmp 1990 Apr 13012. NA -190. NA 13203.
5 dcmp 1990 May 13108. NA -88.9 NA 13197.
6 dcmp 1990 Jun 13183. NA -10.4 NA 13193.
7 dcmp 1990 Jul 13170. 13178. -13.3 5.65 13183.
8 dcmp 1990 Aug 13160. 13161. -9.99 8.80 13169.
9 dcmp 1990 Sep 13113. 13141. -87.4 59.9 13201.
10 dcmp 1990 Oct 13185. 13117. 34.6 33.8 13151.
# ℹ 347 more rows
The resulting dataframe contains the following columns:
trend: trend component
seasonal: seasonal component
random: random component
season_adjust: seasonally adjusted component
Let us compute the detrended time series. Because the scheme is additive, we simply subtract the estimate of the trend from the original time series:
classical_dcmp <- classical_dcmp %>%
  # Compute new column containing the detrended series
  mutate(detrended = Employed - trend)

# Depict the result:
classical_dcmp %>%
  autoplot(detrended)
The detrended time series contains both the effect of the remainder and the seasonal component.
To compute the seasonally adjusted time series we simply remove the seasonal component from the time series. Since the scheme is additive, we attain this by subtraction. Note that the decomposition already provided a season_adjust column. We are going to compute it manually and then check that it leads to the same result:
classical_dcmp <- classical_dcmp %>%
  # Compute new column containing the seasonally adjusted series
  mutate(season_adjust_manual = Employed - seasonal)

# Depict the result along with the original time series
classical_dcmp %>%
  as_tsibble() %>%
  autoplot(Employed, colour = "gray") +
  geom_line(aes(y = season_adjust_manual), colour = "#D55E00") +
  labs(y = "Persons (thousands)", title = "Total employment in US retail")
Note how the seasonally adjusted time series is different from the trend. It contains the trend plus the random component. You can recognize this in the short-term fluctuations of the seasonally adjusted series. Compare it with the figure of the trend, just below:
classical_dcmp %>%
  as_tsibble() %>%
  autoplot(Employed, colour = "gray") +
  geom_line(aes(y = trend), colour = "#D55E00") +
  labs(y = "Persons (thousands)", title = "Total employment in US retail")
Comparing both, we see that the seasonally adjusted time series is not “smooth”. Its “downturns” or “upturns” can be misleading. If you want to look for turning points and interpret changes in direction, it is best to use the trend-cycle component rather than the seasonally adjusted data.
Checking our manual computation with all.equal
Finally, we can check that our manually generated seasonally adjusted series matches the seasonally adjusted series computed when using classical_decomposition(). The function all.equal compares two vectors element by element. It can be used as follows for the case at hand
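A minimal, self-contained sketch of such a check follows. The two toy vectors are hypothetical stand-ins for the `season_adjust` and `season_adjust_manual` columns (their values are not taken from the decomposition), and the commented call at the end is an assumption about how the comparison would look with the `classical_dcmp` object built in this notebook.

```r
# Toy vectors standing in for season_adjust and season_adjust_manual
# (hypothetical values, not taken from the decomposition above).
season_adjust        <- c(0.3, 13331.5, 13239.0)
season_adjust_manual <- c(0.1 + 0.2, 13331.5, 13239.0)

# An exact comparison fails on the first element because 0.1 + 0.2 is not
# stored as exactly 0.3 in double precision.
exact_match <- identical(season_adjust, season_adjust_manual)

# all.equal() compares within a small numeric tolerance. It returns TRUE or
# a character description of the difference, so wrap it in isTRUE() when a
# single logical value is needed.
approx_match <- isTRUE(all.equal(season_adjust, season_adjust_manual))

print(exact_match)
print(approx_match)

# With the objects from this notebook, the call would look like this
# (assumption, not run here):
# all.equal(classical_dcmp$season_adjust, classical_dcmp$season_adjust_manual)
```

Wrapping the call in `isTRUE()` matters when the result feeds an `if` statement, because `all.equal()` does not return FALSE on a mismatch.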
For the above code to evaluate to TRUE, you will sometimes need to round some of the components to a specific number of decimals. This is due to floating-point inaccuracies, which you should have studied as part of computer science.
Example 2: multiplicative scheme
Further up in this notebook we computed the dataset a10, corresponding to the expenses on antidiabetic medications in Australia’s public healthcare system. We concluded that this time series followed a multiplicative scheme:
a10 %>%
  autoplot()
Plot variable not specified, automatically selected `.vars = TotalC`
For this example we are going to fit a classical_decomposition model. The specifics of this syntax will be explained in later sessions, but they are fairly straightforward.
classical_dcmp <- a10 %>%
  # 1. Fit the model
  model(dcmp = classical_decomposition(Cost, type = "multiplicative")) %>%
  # 2. Extract the components
  components()

classical_dcmp
# A dable: 204 x 7 [1M]
# Key: .model [1]
# : Cost = trend * seasonal * random
.model Month Cost trend seasonal random season_adjust
<chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
1 dcmp 1991 Jul 3.53 NA 0.979 NA 3.60
2 dcmp 1991 Aug 3.18 NA 0.990 NA 3.21
3 dcmp 1991 Sep 3.25 NA 0.986 NA 3.30
4 dcmp 1991 Oct 3.61 NA 1.05 NA 3.45
5 dcmp 1991 Nov 3.57 NA 1.08 NA 3.32
6 dcmp 1991 Dec 4.31 NA 1.23 NA 3.50
7 dcmp 1992 Jan 5.09 3.50 1.33 1.09 3.82
8 dcmp 1992 Feb 2.81 3.53 0.780 1.02 3.61
9 dcmp 1992 Mar 2.99 3.57 0.876 0.956 3.41
10 dcmp 1992 Apr 3.20 3.60 0.858 1.04 3.74
# ℹ 194 more rows
The resulting dataframe contains the following columns:
trend: trend component
seasonal: seasonal component
random: random component
season_adjust: seasonally adjusted component
To compute the de-trended time series, since this is a multiplicative scheme, we need to divide by the trend estimate:
classical_dcmp <- classical_dcmp %>%
  # Compute new column containing the detrended series
  mutate(detrended = Cost / trend)

# Depict the result:
classical_dcmp %>%
  autoplot(detrended)
The detrended component contains both the effect of the remainder and the seasonal component.
To compute the seasonally adjusted time series we simply remove the seasonal component from the time series. Since the scheme is multiplicative, we attain this by division. Note that the decomposition already provided a season_adjust column. We are going to compute it manually and then check that it leads to the same result:
classical_dcmp <- classical_dcmp %>%
  # Compute new column containing the seasonally adjusted series
  mutate(season_adjust_manual = Cost / seasonal)

# Depict the result along with the original time series
classical_dcmp %>%
  as_tsibble() %>%
  autoplot(Cost, colour = "gray") +
  geom_line(aes(y = season_adjust_manual), colour = "#D55E00")
Note how the seasonally adjusted series is different from the trend. Because the scheme is multiplicative, it contains the trend multiplied by the random component. You can recognize this in the short-term fluctuations of the seasonally adjusted series. Compare it with the figure of the trend, just below:
For the above code to evaluate to TRUE, you will sometimes need to round some of the components to a specific number of decimals. This is due to floating-point inaccuracies, which you should have studied as part of computer science.
Mixed schemes
On many occasions schemes cannot be unequivocally classified as additive or multiplicative. We will deal with such cases when studying the Box-Cox transformation.
A common scheme that is not fully additive or multiplicative is the so-called mixed scheme, in which:
The trend-cycle and seasonal components are combined in a multiplicative manner.
The remainder component is superimposed in an additive manner.
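Putting the two statements above together, and writing \(y_t\) for the observed series (a symbol assumed here), the mixed scheme can be sketched as:

```latex
\begin{equation*}
y_t = T_t \times S_t + R_t
\end{equation*}
```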
This particular scheme is relevant when:
The irregular component \(R_t\) does not grow with the level of the time series (and is therefore superimposed in an additive manner).
The seasonal component grows with the level of the time series, and is therefore combined with the trend in a multiplicative manner.